Learning to represent reward structure: a key to adapting to complex environments.

نویسندگان

  • Hiroyuki Nakahara
  • Okihide Hikosaka
چکیده

Predicting outcomes is a critical ability of humans and animals. The dopamine reward prediction error hypothesis, the driving force behind the recent progress in neural "value-based" decision making, states that dopamine activity encodes the signals for learning in order to predict a reward, that is, the difference between the actual and predicted reward, called the reward prediction error. However, this hypothesis and its underlying assumptions limit the prediction and its error as reactively triggered by momentary environmental events. Reviewing the assumptions and some of the latest findings, we suggest that the internal state representation is learned to reflect the environmental reward structure, and we propose a new hypothesis - the dopamine reward structural learning hypothesis - in which dopamine activity encodes multiplex signals for learning in order to represent reward structure in the internal state, leading to better reward prediction.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Chaotic Genetic Algorithm based on Explicit Memory with a new Strategy for Updating and Retrieval of Memory in Dynamic Environments

Many of the problems considered in optimization and learning assume that solutions exist in a dynamic. Hence, algorithms are required that dynamically adapt with the problem’s conditions and search new conditions. Mostly, utilization of information from the past allows to quickly adapting changes after. This is the idea underlining the use of memory in this field, what involves key design issue...

متن کامل

Modelling Motivation as an Intrinsic Reward Signal for Reinforcement Learning Agents

Reinforcement learning agents require a learning stimulus in the form of a reward signal in order for learning to occur. Typically, this reward signal makes specific assumptions about the agent’s external environment, such as the presence of certain tasks which should be learned or the presence of a teacher to provide reward feedback. For many complex, dynamic environments, design time knowledg...

متن کامل

A Framework for Adapting Population-Based and Heuristic Algorithms for Dynamic Optimization Problems

In this paper, a general framework was presented to boost heuristic optimization algorithms based on swarm intelligence from static to dynamic environments. Regarding the problems of dynamic optimization as opposed to static environments, evaluation function or constraints change in the time and hence place of optimization. The subject matter of the framework is based on the variability of the ...

متن کامل

Scaffolding: A way for supporting learners in e-learning environments

Introduction: One of the effective ways to help learners improve their learning in learning environments is scaffolding. Scaffolding can be defined as teachers, other learners and resources support of learner on tasks that can’t be done alone. Experts strongly demand scaffolding to be used to help learners on abstractive and complex issues. The aim of this study is to examine the scaffoldi...

متن کامل

A Q-learning Based Continuous Tuning of Fuzzy Wall Tracking

A simple easy to implement algorithm is proposed to address wall tracking task of an autonomous robot. The robot should navigate in unknown environments, find the nearest wall, and track it solely based on locally sensed data. The proposed method benefits from coupling fuzzy logic and Q-learning to meet requirements of autonomous navigations. Fuzzy if-then rules provide a reliable decision maki...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Neuroscience research

دوره 74 3-4  شماره 

صفحات  -

تاریخ انتشار 2012